-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Destinations v2: Do not dedup raw table #31520
Conversation
The latest updates on your projects. Learn more about Vercel for Git ↗︎
|
Before Merging a Connector Pull RequestWow! What a great pull request you have here! 🎉 To merge this PR, ensure the following has been done/considered for each connector added or updated:
If the checklist is complete, but the CI check is failing,
|
3a28809
to
0ae0e67
Compare
assertAll( | ||
() -> assertEquals("bar", actualRawRecords.get(0).get("_airbyte_data").get("string").asText()), | ||
() -> assertEquals("bar", actualFinalRecords.get(0).get(generator.buildColumnId("string").name()).asText())); | ||
assertEquals("bar", actualFinalRecords.get(0).get(generator.buildColumnId("string").name()).asText()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there's no longer any reason to assert the raw records, so only assert the final record.
dumpRawTableRecords(streamId), | ||
5, | ||
6, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a behavior change; see https://airbytehq-team.slack.com/archives/C03C4AVJWG4/p1697486280303179
@@ -270,7 +272,7 @@ public void fullRefreshAppend() throws Exception { | |||
|
|||
runSync(catalog, messages2); | |||
|
|||
final List<JsonNode> expectedRawRecords2 = readRecords("dat/sync2_expectedrecords_fullrefresh_append_raw.jsonl"); | |||
final List<JsonNode> expectedRawRecords2 = readRecords("dat/sync2_expectedrecords_append_raw.jsonl"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there's no longer any difference between append and dedup raw records, so merge their expectedrecords files.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
confirmed via manual test that downgrading from this version to current master results in us re-deduping the raw table (i.e. rollback works as expected). (cc @jbfbell ) |
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
destination-snowflake test report (commit
|
Step | Result |
---|---|
Build connector tar | ✅ |
Java Connector Unit Tests | ✅ |
Build destination-snowflake docker image for platform(s) linux/amd64 | ✅ |
Java Connector Integration Tests | ✅ |
Validate metadata for destination-snowflake | ✅ |
Connector version semver check | ✅ |
Connector version increment check | ✅ |
QA checks | ✅ |
☁️ View runs for commit in Dagger Cloud
Please note that tests are only run on PR ready for review. Please set your PR to draft mode to not flood the CI engine and upstream service on following commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command
airbyte-ci connectors --name=destination-snowflake test
destination-bigquery test report (commit
|
Step | Result |
---|---|
Build connector tar | ✅ |
Java Connector Unit Tests | ✅ |
Build destination-bigquery docker image for platform(s) linux/amd64 | ✅ |
Java Connector Integration Tests | ✅ |
Validate metadata for destination-bigquery | ✅ |
Connector version semver check | ✅ |
Connector version increment check | ✅ |
QA checks | ✅ |
☁️ View runs for commit in Dagger Cloud
Please note that tests are only run on PR ready for review. Please set your PR to draft mode to not flood the CI engine and upstream service on following commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command
airbyte-ci connectors --name=destination-bigquery test
closes #30710
Stop deduping the raw table. This necessitates some changes to how T+D works overall. There are also a huge amount of test changes to fix the raw table expectations. (many of these are purely organizational - e.g. combining the dedup/nondedup raw expected records files, since they're now identical)
The final tables remain (almost) identical to previous behavior. The only difference is in how we handle typing failures in the
_ab_cdc_deleted_at
column. Given that connectors generate that column internally (i.e. we have perfect control over it) I don't think that matters.previous T+D logic:
where loaded_at is null OR deleted_at is not null
)where row_number != 1
)where raw_id not in final_table
)where raw_id not in (select from raw_table where deleted_at is not null)
)new logic:
(where loaded_at is null OR deleted_at is not null) AND row_number = 1
)merge
-based T+D implementation (where each existing final row must match at most one new raw row).where row_number != 1
)where deleted_at is not null
)cast
call is successful.(upcoming
merge
logic: replace steps 1, 2, and 3 with a singlemerge
statement that does an upsert+CDC deletes in one step)